
stream: experimental stream/iter implementation#62066

Open
jasnell wants to merge 56 commits into nodejs:main from jasnell:jasnell/new-streams-prototype

Conversation


@jasnell jasnell commented Mar 1, 2026

Opening this for discussion. Not intending to land this yet. It adds an implementation of the "new streams" to core and adds support to FileHandle with tests and benchmarks just to explore implementation feasibility, performance, etc.

This is an implementation of the "new streams" API for Node.js along with an example integration with FileHandle. This covers the core part of the implementation.

The module is stream/iter. It is gated behind the --experimental-stream-iter CLI flag.

Benchmark results comparing Node.js streams, Web streams, and stream/iter (higher number is better)

| Benchmark | classic | webstream | iter | iter-sync | iter vs classic |
| --- | --- | --- | --- | --- | --- |
| Identity 1MB | 1,245 | 582 | 3,110 | 16,658 | 2.5x |
| Identity 64MB | 31,410 | 14,980 | 33,894 | 62,111 | 1.1x |
| Transform 1MB | 287 | 227 | 325 | 327 | 1.1x |
| Transform 64MB | 595 | 605 | 605 | 573 | 1.0x |
| Compression 1MB | 123 | 98 | 110 | -- | 0.9x |
| Compression 64MB | 329 | 303 | 308 | -- | 0.9x |
| pipeTo 1MB | 1,137 | 494 | 2,740 | 13,611 | 2.4x |
| pipeTo 64MB | 22,081 | 15,377 | 30,036 | 60,976 | 1.4x |
| Broadcast 1c 1MB | 1,365 | 521 | 1,991 | -- | 1.5x |
| Broadcast 2c 1MB | 1,285 | 439 | 1,962 | -- | 1.5x |
| Broadcast 4c 1MB | 1,217 | 322 | 750 | -- | 0.6x |
| File read 16MB | 1,469 | 537 | 1,639 | -- | 1.1x |

It's worth noting that the performance of the FileHandle benchmark added here, which reads files, converts them to upper case, and then compresses them, is on par with Node.js streams and twice as fast as web streams (though web streams are not perf-optimized in any way, so take that 2x with a grain of salt). The majority of the perf cost in the benchmark is due to compression overhead. Without the compression transform, the new stream can be up to 15% faster than reading the file with classic Node.js streams.

The main thing this shows is that the new streams impl can (a) perform reasonably and (b) sit comfortably alongside the existing impls without any backwards compat concerns.

Benchmark runs:

```
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="classic": 0.4520276595366672
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="classic": 0.5974527572097321
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="classic": 0.6425952035725405
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="webstream": 0.1911778984563999
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="webstream": 0.2179878501077266
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="webstream": 0.2446390516960688
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="pull": 0.5118129753083176
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="pull": 0.6280697056085692
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="pull": 0.596177892010514
---
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="classic": 0.44890689503274533
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="classic": 0.5922959407897667
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="classic": 0.6151916200977057
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="webstream": 0.22796906713941217
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="webstream": 0.2517499148269662
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="webstream": 0.2613608248108332
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=1048576 api="pull": 0.4725187688512099
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=16777216 api="pull": 0.5180217625521253
fs/bench-filehandle-pull-vs-webstream.js n=5 filesize=67108864 api="pull": 0.616770183722841
```

Opencode/Opus 4.6 were leveraged heavily in creating this PR, following a strict, iterative jasnell-in-the-loop process.

--
Reviewing Guide

The draft spec this is implementing is located at https://stream-iter.jasnell.me/

The implementation is primarily in lib/internal/streams/iter ... that's where you should start. The functionality is split across key files by operation, which should make it easier to review.

The tests are in parallel, prefixed test-stream-iter-*; they are also organized by functional area.

There are benchmarks in bench/streams prefixed with iter-*.

@jasnell jasnell requested review from mcollina and ronag March 1, 2026 18:37
@nodejs-github-bot

Review requested:

  • @nodejs/performance
  • @nodejs/streams

@nodejs-github-bot added the lib / src (Issues and PRs related to general changes in the lib or src directory) and needs-ci (PRs that need a full CI run) labels Mar 1, 2026


@ronag ronag left a comment


Super impressed! This is amazing.

One note. Since this is supposed to be "web compatible" it looks to me like everything is based on Uint8Array, which is a bit unfortunate for Node. Could the Node implementation use Buffer? It would still be compatible; we'd just be able to access the Buffer prototype methods without doing hacks like Buffer.prototype.write.call(...).

@ronag

ronag commented Mar 2, 2026

Also could you do some mitata based benchmarks so that we can see the gc and memory pressure relative to node streams?

@ronag

ronag commented Mar 2, 2026

Another thing: in the async generator case, can we pass an optional AbortSignal? i.e. async function * (src, { signal }). We could maybe even check the function signature and, if it doesn't take a second parameter, not allocate the AbortController at all.

@jasnell

jasnell commented Mar 2, 2026

> One note. Since this is supposed to be "web compatible" it looks to me like everything is based on Uint8Array which is a bit unfortunate for Node. Could the node implementation use Buffer it would still be compatible it's just that we can access the Buffer prototype methods without doing hacks like Buffer.prototype.write.call(...).

This makes me a bit nervous for code portability. If someone starts working with this in Node.js, they will end up writing code that depends on the values being Buffer and not just Uint8Array. When they move that code to another runtime or a standalone impl like https://github.com/jasnell/new-streams, suddenly that assumption breaks.


@benjamingr benjamingr left a comment


> just to explore implementation feasibility, performance, etc

Sounds fine, as this isn't exposed externally at this time.


```js
// Buffer is full
switch (this._backpressure) {
  case 'strict':
```
Member

I'm not sure strict should be the default and not block here.

Member Author

That'll be a big part of the discussion around this. A big part of the challenge with web streams is that backpressure can be fully ignored. One of the design principles for this new approach is to apply it strictly by default. We'll need to debate this. Recommend opening an issue at https://github.com/jasnell/new-streams

@benjamingr benjamingr left a comment

Sorry, I meant to approve. Regardless of design changes/suggestions regarding timing and a lot of other things, as experimental this is fine.

I would maybe update the docs to emphasize the experimental status even more strongly than usual.

@jasnell

jasnell commented Mar 3, 2026

@ronag ... implemented a couple of mitata benchmarks in the https://github.com/jasnell/new-streams repo (the reference impl)

--

Memory Benchmark Results

Environment: Node 25.6.0, Intel Xeon w9-3575X, --expose-gc, mitata with .gc('inner')

Per-Operation Allocations (New Streams vs Web Streams)

| Scenario | Speed | Heap/iter (new) | Heap/iter (web) |
| --- | --- | --- | --- |
| Push write/read (1K x 4KB) | 2.24x faster | 2.06 MB | 1.43 MB |
| Pull + transform (1K x 4KB) | 2.44x faster | 334 KB | 5.57 MB |
| pipeTo + transform (1K x 4KB) | 3.15x faster | 303 KB | 7.47 MB |
| Broadcast 2 consumers (500 x 4KB) | 1.04x faster | 1.92 MB | 1.81 MB |
| Large pull 40MB (10K x 4KB) | 1.26x faster | 2.62 MB | 52.35 MB |

Pipeline scenarios (pull, pipeTo) show the biggest gains: 16-25x less heap because transforms are inline function calls, not stream-to-stream pipes with internal queues. Push is faster but uses slightly more heap due to batch iteration (Uint8Array[]). Broadcast/tee are comparable at this scale.

Sustained Load (97.7 MB volume)

| Scenario | Peak Heap (new) | Peak Heap (web stream) |
| --- | --- | --- |
| pipeTo + transform | 6.9 MB | 50.6 MB |
| Broadcast 2 consumers | 0.5 MB | 42.8 MB |
| Push write/read | 5.9 MB | 2.5 MB |
| Pull + transform | 6.1 MB | 2.8 MB |

pipeTo and broadcast show the largest sustained-load heap difference. Web Streams' pipeThrough chain buffers ~50% of total volume in flight; new streams' pipeTo pulls synchronously through the transform. Broadcast's shared ring buffer peaks at 0.5 MB vs 42.8 MB for tee's per-branch queues.

Zero retained memory for both APIs after completion -- no leaks.


jedwards1211 commented Mar 3, 2026

@ronag passing a signal to an async generator allows the underlying source to abort it, but we're lacking a builtin way for the consumer iterating the async generator to safely cancel the stream. It can .return() its iterator when it's done, but that won't break the async generator out of a pending await until it receives the next chunk, which isn't guaranteed to happen if the underlying source is something nondeterministic like pubsub events. In this case, there would be leaks that are kind of awkward to blame on user error.

Barring an improvement at the language level, the consumer can only safely cancel the underlying source if it has a reference to an AbortController that signals it.

WHATWG Streams don't have this problem if the consumer .cancel()s their reader, though they do if the consumer is async iterating them.

Happy to create examples to reproduce this if it's not clear what I'm talking about.

@ronag

ronag commented Mar 3, 2026

I think you misunderstand. The signal would be for any async calls inside the generator.


jedwards1211 commented Mar 3, 2026

Yes, I'm just saying that doesn't allow the consumer to abort calls the async generator is making, but the consumer often decides when streaming should be aborted.

For example say I'm using a library that handles subscriptions from the frontend. When it gets a subscription it asks me to build an async iterable of events to stream back. Then it's responsible for iterating, then cancelling once the frontend unsubscribes. If the iterable I pass to that library is from an async generator, I'll have to also pass an AbortController to that library for it to safely clean up once the client unsubscribes. If all it has is an AsyncIterable interface, it may leak resources after the client unsubscribes.

This is a fundamental weakness in using async generators for transformation and my longtime frustration with async iteration in general.

In contrast, with WHATWG streams, when a consumer cancels its reader, the underlying source and any TransformStreams get notified to clean up right away.


jedwards1211 commented Mar 3, 2026

@benjamingr was actually talking about the same thing I'm trying to resurrect awareness of in this old issue in the async-iteration proposal

Note one of his comments: tc39/proposal-async-iteration#126 (comment)

This was eight years ago but there hasn't been much improvement on this front, unfortunately.

I'm really hoping I can get everyone to fully understand this pitfall and have a good plan for how to help people avoid it before getting too far along with this new proposed API.

@jasnell jasnell force-pushed the jasnell/new-streams-prototype branch from 9f8af01 to e1e1911 Compare March 3, 2026 17:07
@jasnell jasnell changed the title [DRAFT] stream: prototype for new stream implementation stream: experimental stream/iter implementation Mar 18, 2026


jasnell commented Mar 18, 2026

I've updated the implementation to address the remaining outstanding issues, round out tests, add benchmarks, fix bugs, etc. It's also now behind an experimental cli flag.

This is ready for review.




codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 89.66399% with 606 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.66%. Comparing base (9fc6b64) to head (31d5558).
⚠️ Report is 15 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lib/internal/streams/iter/pull.js | 83.24% | 146 Missing and 6 partials ⚠️ |
| lib/internal/streams/iter/broadcast.js | 84.64% | 114 Missing and 3 partials ⚠️ |
| lib/internal/streams/iter/share.js | 84.17% | 101 Missing and 2 partials ⚠️ |
| lib/internal/streams/iter/from.js | 89.08% | 63 Missing ⚠️ |
| lib/internal/streams/iter/push.js | 91.40% | 59 Missing and 3 partials ⚠️ |
| lib/internal/fs/promises.js | 81.37% | 54 Missing ⚠️ |
| lib/internal/streams/iter/consumers.js | 96.55% | 18 Missing ⚠️ |
| lib/internal/streams/iter/ringbuffer.js | 88.74% | 17 Missing ⚠️ |
| lib/internal/streams/iter/transform.js | 97.35% | 13 Missing and 2 partials ⚠️ |
| lib/internal/streams/iter/duplex.js | 96.45% | 5 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #62066      +/-   ##
==========================================
- Coverage   89.68%   89.66%   -0.02%
==========================================
  Files         676      688      +12
  Lines      206575   212553    +5978
  Branches    39549    40713    +1164
==========================================
+ Hits       185262   190592    +5330
- Misses      13446    14080     +634
- Partials     7867     7881      +14
```
| Files with missing lines | Coverage Δ |
| --- | --- |
| lib/internal/bootstrap/realm.js | 96.21% <100.00%> (ø) |
| lib/internal/process/pre_execution.js | 97.47% <100.00%> (+0.54%) ⬆️ |
| lib/internal/streams/iter/types.js | 100.00% <100.00%> (ø) |
| lib/internal/streams/iter/utils.js | 100.00% <100.00%> (ø) |
| lib/stream/iter.js | 100.00% <100.00%> (ø) |
| src/node_builtins.cc | 76.00% <100.00%> (-0.15%) ⬇️ |
| src/node_options.cc | 76.47% <100.00%> (+0.02%) ⬆️ |
| src/node_options.h | 97.94% <100.00%> (+0.01%) ⬆️ |
| lib/internal/streams/iter/duplex.js | 96.45% <96.45%> (ø) |
| lib/internal/streams/iter/transform.js | 97.35% <97.35%> (ø) |

... and 8 more

... and 44 files with indirect coverage changes



jasnell commented Mar 20, 2026

Performed some memory profiling comparing stream/iter with "classic" Node.js streams. This is based on the current iteration of stream/iter at the time this comment is posted; expand the details block for the information.

Details

Methodology

Six benchmarks comparing classic Node.js streams (stream.Readable/stream.Writable/pipeline) against the new stream/iter API across representative usage patterns. All benchmarks run with --expose-gc for forced GC before/after measurement, and use PerformanceObserver to capture GC event counts, types, and pause durations.

Each benchmark runs warmup iterations first, then measured iterations with memory snapshots (via process.memoryUsage() and v8.getHeapStatistics()) taken before and after the measured window.


Benchmark Results

1. Simple Pipe-Through

Scenario: 64MB of 64KB chunks piped from source to no-op sink. 10 iterations.
Measures baseline piping overhead without transforms.

| Metric | Classic | Iter | Iter-Sync | Iter vs Classic |
| --- | --- | --- | --- | --- |
| Time | 20.6ms | 9.8ms | 3.6ms | 2.1x faster |
| Heap delta | 104.1 KB | 58.8 KB | 30.1 KB | 44% less |
| RSS delta | 1.38 MB | 704 KB | 0 B | 49% less |
| GC events | 2 | 13 | 2 | More minor GCs |
| GC pause | 3.26ms | 3.48ms | 2.26ms | Similar |
Analysis: Iter uses less heap per run due to the absence of ReadableState, WritableState, _events, and EventEmitter infrastructure. The higher minor GC count for iter-async reflects more frequent short-lived promise allocations from the async generator protocol, but each is cheap and efficiently collected by Scavenge. Iter-sync eliminates nearly all allocation since generators produce no promises.


2. SSR-Type Concurrent Streams

Scenario: 100 concurrent streams, each producing 100KB through a buffer-copy transform with 4KB chunks. 5 iterations. Simulates server-side rendering workloads.

| Metric | Classic | Iter | Iter vs Classic |
| --- | --- | --- | --- |
| Time | 87.2ms | 33.2ms | 2.6x faster |
| Heap delta | 16.1 KB | -10 KB | Neutral |
| RSS delta | 25.44 MB | 6.88 MB | 73% less RSS |
| GC events | 6 | 3 | 50% fewer |
| GC pause | 9.03ms | 4.18ms | 54% less |

Analysis: The most dramatic result. Classic streams allocate approximately 25 objects per stream (Readable + Writable + ReadableState + WritableState + _events x 2 + pipeline scaffolding + end-of-stream listeners + closures). With 100 concurrent streams x 5 iterations = 500 stream lifecycles, that is roughly 12,500 objects from infrastructure alone. Iter avoids all of this as generators and plain objects have near-zero construction overhead. The 73% RSS reduction is significant for server workloads.


3. Backpressure

Scenario: 8MB of 16KB chunks with a slow consumer that delays every 8th write by 1ms. 5 iterations. Tests buffer memory growth under sustained pressure.

| Metric | Classic | Iter-Push | Iter-Pull |
| --- | --- | --- | --- |
| Time | 348.7ms | 361.3ms | 2.3ms |
| Heap delta | 133.1 KB | 7.7 KB | 17.5 KB |
| RSS delta | 0 B | 0 B | 0 B |
| GC events | 2 | 3 | 4 |
| GC pause | 3.70ms | 5.46ms | 2.76ms |

Analysis: Under backpressure, classic streams buffer {chunk, encoding, callback} objects per write in the Writable's internal buffer -- 133KB of heap growth. Iter-push uses a RingBuffer with just the chunk reference -- 7.7KB, 95% less buffer overhead. The pull model avoids buffering entirely because the source only yields when the consumer requests data.

The iter-pull result (2.3ms) reflects the writeSync try-fallback pattern: 7 out of 8 writes complete synchronously, entirely avoiding the setTimeout delay. This is a valid demonstration of the pull model's advantage -- the sync fast path avoids async overhead when the writer can handle data synchronously.


4. Many Short-Lived Streams

Scenario: 10,000 streams, each producing 4KB (4 x 1KB chunks). 3 iterations. Measures per-stream construction/teardown overhead and GC pressure from rapid allocation and deallocation.

| Metric | Classic | Iter | Iter-Sync | Iter vs Classic |
| --- | --- | --- | --- | --- |
| Time | 468.7ms | 110.6ms | 81.7ms | 4.2x faster |
| Heap delta | 68.7 KB | 75.8 KB | 414.4 KB | Similar |
| RSS delta | 20.54 MB | -504 KB | 1.38 MB | Dramatically less |
| GC events | 23 | 8 | 3 | 65% fewer |
| GC pause | 87.73ms | 8.04ms | 9.35ms | 91% less |

Analysis: The most revealing benchmark for construction/teardown overhead. Classic streams at 10,000 stream lifecycles trigger 23 GC events totaling 87.73ms -- 18.7% of total runtime is GC. Each classic stream creates approximately 25 objects (Readable, Writable, two States, _events, pipeline closures, end-of-stream listeners). Over 10K x 3 iterations x 25 = roughly 750,000 objects created and immediately abandoned. Iter creates a generator + plain object per stream, perhaps 3-4 objects. GC pause drops from 87.73ms to 8.04ms.


5. Deep Transform Chain

Scenario: 16MB of 16KB chunks through 5 identity transforms to a no-op sink. 5 iterations. Measures per-transform memory overhead.

| Metric | Classic | Iter | Iter-Sync | Iter vs Classic |
| --- | --- | --- | --- | --- |
| Time | 14.9ms | 10.6ms | 3.5ms | 1.4x faster |
| Heap delta | 77.0 KB | 25.4 KB | -32 B | 67% less |
| RSS delta | 1.38 MB | 0 B | 0 B | No RSS growth |
| GC events | 6 | 9 | 1 | |
| GC pause | 4.88ms | 3.29ms | 2.32ms | 33-52% less |

Analysis: Each classic Transform is a full Duplex stream (Readable + Writable + both States + _events + pipeline wiring) -- 5 transforms means roughly 125 infrastructure objects. Iter fuses all 5 stateless transforms into a single generator layer -- one additional generator frame regardless of depth. The sync path achieves zero net heap growth.


6. Fan-Out

Scenario: 1 source producing 16MB of 16KB chunks consumed by 4 readers. 5 iterations. Classic uses pipe() to PassThrough streams; iter uses broadcast.

| Metric | Classic | Broadcast | Iter vs Classic |
| --- | --- | --- | --- |
| Time | 9.6ms | 11.6ms | 21% slower |
| Heap delta | 132.8 KB | 14.7 KB | 89% less |
| RSS delta | 2.06 MB | 704 KB | 66% less |
| GC events | 5 | 9 | More minor GCs |
| GC pause | 2.81ms | 2.90ms | Similar |

Analysis: Fan-out is the one scenario where classic streams are faster. pipe() to multiple PassThrough streams is highly optimized in Node.js. However, broadcast uses dramatically less memory: 14.7KB vs 132.8KB heap delta. Classic creates 4 PassThrough streams (each a full Duplex) + pipe wiring + end-of-stream listeners. Broadcast shares a single RingBuffer with cursor-based consumers where each consumer is just a {cursor, resolve, reject, detached} state object.


Structural Comparison

Per-Stream Construction Cost

| Component | Classic Streams | stream/iter |
| --- | --- | --- |
| Stream object | Readable/Writable instance | Generator function (no object) |
| State tracking | ReadableState + WritableState (~40 fields each) | Closure variables (zero objects) |
| Event system | _events object + listener arrays | None |
| Buffer | Array + index + compaction logic | RingBuffer (power-of-2 backing array) |
| Pipeline wiring | AbortController + eos listeners + destroyer closures (15-20 objects) | for await loop (0 objects) |
| Backpressure state | awaitDrainWriters + drain listeners + cork/uncork | RingBuffer capacity check (0 objects) |
| Total per stream | ~25-30 heap objects | ~3-5 heap objects |

Per-Chunk/Per-Batch Hot Path Cost

| Cost | Classic | stream/iter |
| --- | --- | --- |
| Iterator result | N/A (push model) | 1 {done, value} per batch |
| Promise per chunk | 0 (callback-based) | 1 per batch (async gen yield) |
| Backpressure buffering | 1 {chunk, enc, cb} object | 0 (pull model) or 1 chunk ref (push) |
| Write completion | afterWriteTickInfo batching | kResolvedPromise cached (0 alloc) |
| Batch amortization | 1 chunk per event loop tick | N chunks per batch (configurable) |

GC Impact Model

The fundamental difference: classic streams create many long-lived objects that survive young generation collection (State objects, _events, listener closures live for the stream's entire lifetime). This promotes them to old space, requiring expensive Mark-Sweep collection.

Stream/iter creates mostly short-lived objects (iterator results, promises) that die within one async tick and are efficiently collected by Scavenge (minor GC).

For high-churn scenarios (many short-lived streams), classic streams create and abandon ~25 objects per stream lifecycle that must all be traced and collected. Iter creates ~3-5 objects. The GC impact scales linearly with stream count -- at 10K streams, classic spends 88ms in GC vs iter's 8ms.


Summary

| Scenario | Time Winner | Memory Winner | GC Winner |
| --- | --- | --- | --- |
| Simple pipe-through | iter (2.1x) | iter (44% less) | Comparable |
| SSR concurrent (100 streams) | iter (2.6x) | iter (73% less RSS) | iter (54% less) |
| Backpressure | pull (150x) | push (95% less) | Comparable |
| Many short streams (10K) | iter (4.2x) | iter (less RSS) | iter (91% less) |
| Deep transforms (5x) | iter (1.4x) | iter (67% less) | iter (33% less) |
| Fan-out (4 consumers) | classic (1.2x) | broadcast (89% less) | Comparable |

Stream/iter is consistently more memory-efficient across all scenarios, with the advantage most pronounced in high-concurrency and high-churn workloads. The only throughput concession is fan-out, where classic's highly-optimized pipe() path is slightly faster -- but even there, broadcast uses dramatically less memory.

The pull model's inherent backpressure (source only yields on demand) eliminates buffer-related memory growth entirely for the most common use pattern. The batch-oriented design (Uint8Array[] rather than individual chunks) amortizes the per-item overhead of Promises and iterator results across all chunks in a batch, making the async iteration protocol overhead negligible at typical chunk sizes.


jasnell commented Mar 20, 2026

Similar comparison with stream/iter to web streams (noting that our web streams impl has never been fully optimized).

Details

Methodology

Six benchmarks comparing the Web Streams API (ReadableStream/WritableStream/TransformStream/pipeTo/pipeThrough/tee) against the new stream/iter API across representative usage patterns. All benchmarks run with --expose-gc for forced GC before/after measurement, and use PerformanceObserver to capture GC event counts, types, and pause durations.

Each benchmark runs warmup iterations first, then measured iterations with memory snapshots (via process.memoryUsage() and v8.getHeapStatistics()) taken before and after the measured window.


Benchmark Results

1. Simple Pipe-Through

Scenario: 64MB of 64KB chunks piped from source to no-op sink. 10 iterations.

| Metric | Web Streams | Iter | Iter-Sync | Iter vs WS |
| --- | --- | --- | --- | --- |
| Time | 28.8ms | 6.6ms | 3.3ms | 4.4x faster |
| Heap delta | 142.6 KB | 13.8 KB | 30.5 KB | 90% less |
| GC events | 31 | 7 | 1 | 77% fewer |
| GC pause | 4.01ms | 3.06ms | 2.20ms | 24% less |
| Minor GC | 30 | 6 | 0 | 80% fewer |
Analysis: Web Streams trigger 30 minor GCs for the same data volume -- one roughly every 21MB of throughput. Each pipeTo iteration creates internal reader/writer pairs, promise-per-chunk from the pull protocol, and controller state. Iter-async needs only 6 minor GCs (one promise per batch from the async generator). Iter-sync eliminates minor GC entirely. The 90% heap reduction reflects iter's absence of ReadableStreamDefaultController, WritableStreamDefaultController, ReadableStreamDefaultReader, internal queuing strategy objects, and the per-chunk promise overhead of the Web Streams pull protocol.


2. SSR-Type Concurrent Streams

Scenario: 100 concurrent streams, each producing 100KB through a buffer-copy transform with 4KB chunks. 5 iterations.

| Metric | Web Streams | Iter | Iter vs WS |
| --- | --- | --- | --- |
| Time | 132.7ms | 26.7ms | 5.0x faster |
| Heap delta | 191.1 KB | 50.8 KB | 73% less |
| RSS delta | 30.01 MB | 0 B | No RSS growth |
| GC events | 10 | 3 | 70% fewer |
| GC pause | 9.55ms | 3.48ms | 64% less |

Analysis: The most dramatic result. At 100 concurrent streams, Web Streams consume 30MB of RSS growth while iter shows none. Each Web Streams pipeline creates ReadableStream + TransformStream + WritableStream, each with their own controllers, internal queues, [[readableStreamController]] / [[writableStreamDefaultWriter]] internal slots, strategy objects, and promise machinery. That is approximately 30-40 objects per stream. 100 streams x 5 iterations = 500 lifecycles producing roughly 15,000-20,000 infrastructure objects. Iter produces a generator + plain objects per stream -- approximately 3-5 objects.


3. Backpressure

Scenario: 8MB of 16KB chunks with a slow consumer that delays every 8th write by 1ms. 5 iterations.

| Metric | Web Streams | Iter-Push | Iter-Pull |
| --- | --- | --- | --- |
| Time | 351.1ms | 366.2ms | 2.7ms |
| Heap delta | 28.1 KB | 21.4 KB | 6.4 KB |
| RSS delta | 6.19 MB | 0 B | 1.38 MB |
| GC events | 10 | 3 | 2 |
| GC pause | 3.83ms | 4.04ms | 2.22ms |
| Minor GC | 9 | 2 | 1 |

Analysis: Under backpressure with equivalent delay patterns, Web Streams and iter-push perform similarly on time (both dominated by the 1ms delays). However, Web Streams show 6.19MB RSS growth vs zero for iter-push. Web Streams' internal queuing strategy allocates queue entries with { value, size } wrappers and maintains separate [[queue]] arrays on both the readable and writable sides. Iter-push uses a single RingBuffer with direct chunk references.

The pull model (2.7ms) demonstrates its structural advantage: the writeSync try-fallback pattern means 7/8 writes complete synchronously, entirely skipping the delay. The Web Streams pull protocol has no sync fast path -- every controller.enqueue() / reader.read() goes through the promise-based pull protocol.


4. Many Short-Lived Streams

Scenario: 10,000 streams, each producing 4KB (4 x 1KB chunks). 3 iterations.

| Metric | Web Streams | Iter | Iter-Sync | Iter vs WS |
| --- | --- | --- | --- | --- |
| Time | 326.1ms | 98.9ms | 89.2ms | 3.3x faster |
| Heap delta | 371.5 KB | 99.0 KB | 424.7 KB | 73% less (async) |
| GC events | 268 | 36 | 5 | 87% fewer |
| GC pause | 20.65ms | 11.05ms | 12.04ms | 47% less |
| Minor GC | 267 | 35 | 4 | 87% fewer |

Analysis: The GC pressure difference is staggering. Web Streams trigger 267 minor GCs for 30,000 stream lifecycles (10K x 3 iterations) -- nearly one Scavenge per 112 streams. Each ReadableStream + WritableStream pair creates controllers, internal slot objects, queuing strategy instances, and the pipeTo algorithm creates ReadableStreamDefaultReader + WritableStreamDefaultWriter with their own promise slots. At approximately 30+ objects per stream pair, that is 900,000+ objects created and abandoned.

Iter-async triggers only 36 minor GCs (87% fewer) because each stream lifecycle creates roughly 3-5 objects (generator, iterator object, argument parsing result). The GC pause time difference (20.65ms vs 11.05ms) means Web Streams spend 6.3% of total runtime in GC vs iter's 11.2%. While iter's GC percentage is higher, its total runtime is 3.3x shorter, so the absolute time in GC is still lower.


5. Deep Transform Chain

Scenario: 16MB of 16KB chunks through 5 identity transforms to a no-op sink. 5 iterations.

| Metric | Web Streams | Iter | Iter-Sync | Iter vs WS |
| --- | --- | --- | --- | --- |
| Time | 69.8ms | 9.6ms | 2.9ms | 7.3x faster |
| Heap delta | 190.4 KB | -53 KB | 30.1 KB | Net negative |
| GC events | 60 | 9 | 1 | 85% fewer |
| GC pause | 8.95ms | 3.28ms | 2.70ms | 63% less |
| Minor GC | 59 | 8 | 0 | 86% fewer |

Analysis: The widest performance gap. Each Web Streams TransformStream creates a ReadableStream + WritableStream pair internally (with all their controllers and internal slots), plus the transform controller. 5 transforms = 5 full stream pairs = approximately 150+ infrastructure objects, and every chunk passes through 5 promise-based pull cycles. The pipeThrough chain creates 5 pipeTo algorithms running concurrently, each with its own reader/writer pair.

Iter fuses all 5 stateless transforms into a single generator layer. One additional generator frame regardless of transform depth. The per-chunk cost is 5 function calls (the transforms) with no additional promise or object creation. 59 minor GCs for Web Streams vs 8 for iter reflects the massive object creation difference.

Iter-sync achieves 2.9ms with zero minor GC -- the entire pipeline runs as synchronous function calls through a single generator.


6. Fan-Out

Scenario: 1 source producing 16MB of 16KB chunks consumed by 4 readers. 5 iterations.

| Metric | Web Streams | Broadcast | Iter vs WS |
| --- | --- | --- | --- |
| Time | 35.3ms | 11.3ms | 3.1x faster |
| Heap delta | 124.3 KB | 3.5 KB | 97% less |
| RSS delta | 3.44 MB | 704 KB | 80% less |
| GC events | 41 | 9 | 78% fewer |
| GC pause | 4.98ms | 2.81ms | 44% less |
| Minor GC | 40 | 8 | 80% fewer |

Analysis: Web Streams' tee() creates a full branch of the stream for each split. To get 4 consumers from tee(), two levels of teeing are needed (rs.tee() then tee each branch), creating 6 ReadableStream instances total with their controllers, internal queues, and per-chunk promise resolution. Each tee branch independently copies chunk references and maintains separate queue state.

Broadcast shares a single RingBuffer across all consumers. Each consumer is a {cursor, resolve, reject, detached} state object -- 4 objects vs Web Streams' 6 full stream instances with controllers. The 97% heap reduction reflects this fundamental architectural difference: shared buffer with cursors vs independent queue copies.


Structural Comparison

Per-Stream Construction Cost

| Component | Web Streams | stream/iter |
| --- | --- | --- |
| Readable side | ReadableStream + ReadableStreamDefaultController + strategy + [[queue]] | Generator function (no object) |
| Writable side | WritableStream + WritableStreamDefaultController + strategy + [[queue]] | Plain writer object (user-provided) |
| Pipe connection | ReadableStreamDefaultReader + WritableStreamDefaultWriter + promise slots | for await loop (0 objects) |
| Transform | TransformStream = RS + WS + TransformStreamDefaultController | Single function reference |
| Queuing strategy | CountQueuingStrategy or ByteLengthQueuingStrategy instance | Integer HWM on RingBuffer |
| Backpressure tracking | [[backpressure]] slot + desiredSize on controller + promise-based signaling | RingBuffer capacity check |
| Total per pipeline | ~30-40 heap objects | ~3-5 heap objects |

Per-Chunk Protocol Cost

| Cost | Web Streams | stream/iter |
| --- | --- | --- |
| Read protocol | reader.read() returns Promise wrapping {done, value} | for await gets batch from generator |
| Write protocol | writer.write() returns Promise, queued internally | writeSync() returns boolean (0 alloc) |
| Backpressure signal | Promise-based: writer.ready resolves when space available | Sync: RingBuffer length check |
| Transform per-chunk | controller.enqueue() through full RS/WS queue machinery | Direct function call, return value |
| Per-chunk promises | 2+ promises minimum (read + write) | 0-1 promise (batch amortized) |
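The per-chunk vs per-batch promise cost can be illustrated with plain async iteration. This is a sketch: `webStreamsDrain` uses the real Web Streams reader protocol, while `batchedSource` is an illustrative stand-in for a batch-yielding source, not the stream/iter API.

```javascript
// Web Streams read protocol: one promise and one {done, value} result
// object allocated per chunk.
async function webStreamsDrain(rs) {
  const reader = rs.getReader();
  const out = [];
  for (;;) {
    const { done, value } = await reader.read(); // per-chunk promise
    if (done) return out;
    out.push(value);
  }
}

// Batch model: the async generator pays one promise per *batch*; the chunks
// inside a batch are consumed with plain synchronous iteration.
async function* batchedSource(chunks, batchSize) {
  for (let i = 0; i < chunks.length; i += batchSize) {
    yield chunks.slice(i, i + batchSize); // 1 promise amortized over batchSize chunks
  }
}

async function batchedDrain(source) {
  const out = [];
  for await (const batch of source) {
    for (const chunk of batch) out.push(chunk); // sync inner loop, 0 promises
  }
  return out;
}

const drained = batchedDrain(batchedSource([1, 2, 3, 4, 5], 2));
```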

GC Pressure Model

Web Streams create many medium-lived objects per pipeline: controllers, readers, writers, strategy objects, and internal queue entries. These objects live for the duration of the stream but are recreated for each new stream instance. The per-chunk promise overhead (2+ promises per chunk from the read/write protocol) generates significant young-generation pressure.

Stream/iter creates few short-lived objects per batch: one iterator result and one promise from the async generator yield. The writeSync fast path eliminates write-side promise creation entirely. Batch amortization means the per-chunk overhead is divided by batch size.
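The writeSync fast path amounts to returning a boolean capacity signal instead of a promise. A hedged sketch follows: the `BoundedQueue` class and its HWM handling are illustrative stand-ins, not the actual RingBuffer implementation.

```javascript
// Illustrative: a synchronous write path returns a boolean backpressure
// signal, so the happy path allocates no promise. Only when the queue is
// full would a producer fall back to an async wait.
class BoundedQueue {
  constructor(highWaterMark) {
    this.items = [];
    this.highWaterMark = highWaterMark;
  }
  // Returns true while there is room for more writes; 0 allocations.
  writeSync(chunk) {
    this.items.push(chunk);
    return this.items.length < this.highWaterMark;
  }
  readSync() {
    return this.items.shift();
  }
}

const q = new BoundedQueue(2);
const first = q.writeSync('a');  // room remains -> true
const second = q.writeSync('b'); // at capacity -> false, producer should back off
```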

The GC data across all 6 benchmarks:

| Benchmark | Web Streams Minor GCs | Iter Minor GCs | Reduction |
| --- | --- | --- | --- |
| Pipe-through | 30 | 6 | 80% |
| SSR (100 streams) | 9 | 2 | 78% |
| Backpressure | 9 | 2 | 78% |
| Many streams (10K) | 267 | 35 | 87% |
| Deep transforms (5) | 59 | 8 | 86% |
| Fan-out (4) | 40 | 8 | 80% |
| Average | | | 82% |

Summary

| Scenario | Time (Iter vs WS) | Memory Winner | GC Winner |
| --- | --- | --- | --- |
| Simple pipe-through | 4.4x faster | iter (90% less heap) | iter (77% fewer) |
| SSR concurrent (100 streams) | 5.0x faster | iter (73% less heap, no RSS growth) | iter (70% fewer) |
| Backpressure | Comparable (push); 130x faster (pull) | iter (no RSS growth) | iter (70% fewer) |
| Many short streams (10K) | 3.3x faster | iter (73% less heap) | iter (87% fewer) |
| Deep transforms (5x) | 7.3x faster | iter (net negative heap) | iter (85% fewer) |
| Fan-out (4 consumers) | 3.1x faster | iter (97% less heap) | iter (78% fewer) |

Stream/iter outperforms Web Streams on every metric across every scenario tested. The advantages are structural:

  1. No controller/reader/writer object overhead. Web Streams require ReadableStreamDefaultController, ReadableStreamDefaultReader, WritableStreamDefaultController, WritableStreamDefaultWriter, and queuing strategy instances per stream. Iter uses generators and plain objects.

  2. No per-chunk promise tax. Web Streams' pull protocol requires at minimum 2 promises per chunk (reader.read() + writer.write()). Iter's batch model amortizes one promise per batch across all chunks, and the writeSync fast path eliminates the write-side promise entirely.

  3. Transform fusion. Web Streams' pipeThrough creates a full ReadableStream + WritableStream pair per transform. Iter fuses consecutive stateless transforms into a single generator layer regardless of depth.

  4. Shared-buffer fan-out. Web Streams' tee() creates independent stream branches with separate queues. Broadcast shares a single RingBuffer with cursor-based consumers.

The result is an average 82% reduction in GC events, with the gap widest in transform-heavy and high-churn workloads where Web Streams' object-per-stream and promise-per-chunk costs compound.

jasnell (Member, Author) commented Mar 21, 2026

Ok... with the latest round of test coverage updates, the initial development on this is done. Just waiting for code review.


Labels

- experimental: Issues and PRs related to experimental features.
- lib / src: Issues and PRs related to general changes in the lib or src directory.
- needs-ci: PRs that need a full CI run.
- semver-minor: PRs that contain new features and should be released in the next minor version.
- stream: Issues and PRs related to the stream subsystem.
